7 research outputs found

    Algorithms and architectures for MCMC acceleration in FPGAs

    Get PDF
    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.Open Acces

    Particle MCMC algorithms and architectures for accelerating inference in state-space models.

    Get PDF
    Particle Markov Chain Monte Carlo (pMCMC) is a stochastic algorithm designed to generate samples from a probability distribution, when the density of the distribution does not admit a closed form expression. pMCMC is most commonly used to sample from the Bayesian posterior distribution in State-Space Models (SSMs), a class of probabilistic models used in numerous scientific applications. Nevertheless, this task is prohibitive when dealing with complex SSMs with massive data, due to the high computational cost of pMCMC and its poor performance when the posterior exhibits multi-modality. This paper aims to address both issues by: 1) Proposing a novel pMCMC algorithm (denoted ppMCMC), which uses multiple Markov chains (instead of the one used by pMCMC) to improve sampling efficiency for multi-modal posteriors, 2) Introducing custom, parallel hardware architectures, which are tailored for pMCMC and ppMCMC. The architectures are implemented on Field Programmable Gate Arrays (FPGAs), a type of hardware accelerator with massive parallelization capabilities. The new algorithm and the two FPGA architectures are evaluated using a large-scale case study from genetics. Results indicate that ppMCMC achieves 1.96x higher sampling efficiency than pMCMC when using sequential CPU implementations. The FPGA architecture of pMCMC is 12.1x and 10.1x faster than state-of-the-art, parallel CPU and GPU implementations of pMCMC and up to 53x more energy efficient; the FPGA architecture of ppMCMC increases these speedups to 34.9x and 41.8x respectively and is 173x more power efficient, bringing previously intractable SSM-based data analyses within reach.The authors would like to thank the Wellcome Trust (Grant reference 097816/Z/11/A) and the EPSRC (Grant reference EP/I012036/1) for the financial support given to this research project

    Multilevel Delayed Acceptance MCMC

    Full text link
    We develop a novel Markov chain Monte Carlo (MCMC) method that exploits a hierarchy of models of increasing complexity to efficiently generate samples from an unnormalized target distribution. Broadly, the method rewrites the Multilevel MCMC approach of Dodwell et al. (2015) in terms of the Delayed Acceptance (DA) MCMC of Christen & Fox (2005). In particular, DA is extended to use a hierarchy of models of arbitrary depth, and allow subchains of arbitrary length. We show that the algorithm satisfies detailed balance, hence is ergodic for the target distribution. Furthermore, multilevel variance reduction is derived that exploits the multiple levels and subchains, and an adaptive multilevel correction to coarse-level biases is developed. Three numerical examples of Bayesian inverse problems are presented that demonstrate the advantages of these novel methods. The software and examples are available in PyMC3.Comment: 29 pages, 12 figure

    Population-Based MCMC on Multi-Core CPUs, GPUs and FPGAs

    No full text

    An Unbiased MCMC FPGA-Based Accelerator in the Land of Custom Precision Arithmetic

    No full text

    Multilevel Delayed Acceptance MCMC with an Adaptive Error Model in PyMC3

    Get PDF
    Uncertainty Quantification through Markov Chain Monte Carlo (MCMC) can be prohibitively expensive for target probability densities with expensive likelihood functions, for instance when the evaluation it involves solving a Partial Differential Equation (PDE), as is the case in a wide range of engineering applications. Multilevel Delayed Acceptance (MLDA) with an Adaptive Error Model (AEM) is a novel approach, which alleviates this problem by exploiting a hierarchy of models, with increasing complexity and cost, and correcting the inexpensive models on-the-fly. The method has been integrated within the open-source probabilistic programming package PyMC3 and is available in the latest development version. In this paper, the algorithm is presented along with an illustrative example.Comment: 8 pages, 4 figures, accepted for Machine Learning for Engineering Modeling, Simulation, and Design Workshop at Neural Information Processing Systems 202
    corecore